HW1 Quat

Author

Amy Quarkume

1. Examination of Lung Capacity

LungCapData

  1. Use the LungCapData to answer the following questions. (Hint: Using dplyr, especially group_by() and summarize() can help you answer the following questions relatively efficiently.)

Install Libraries

#install.packages("dplyr")
library(dplyr)

Attaching package: 'dplyr'
The following objects are masked from 'package:stats':

    filter, lag
The following objects are masked from 'package:base':

    intersect, setdiff, setequal, union
#install.packages("ggplot2")
library(ggplot2)
#install.packages("readxl")
library(readxl)
#install.packages("magrittr")
library(magrittr)
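Every chunk in this section fails with object 'LungCapData' not found because the data set is never read into the session. A minimal sketch of the missing step is below. It assumes the data is stored as an Excel file (readxl is loaded above); the path LungCapData.xls is a placeholder, not the actual location of the course file.

```r
# Hypothetical path: point this at wherever the course file is actually stored.
lung_cap_path <- "LungCapData.xls"

if (file.exists(lung_cap_path)) {
  # read_excel() returns a tibble, so the dplyr verbs used below work on it directly
  LungCapData <- readxl::read_excel(lung_cap_path)
  # Check the column names and types (the code below uses LungCap, Age, Gender, Smoke)
  str(LungCapData)
} else {
  message("File not found: ", lung_cap_path)
}
```

Once LungCapData exists in the session, the chunks below should run instead of stopping at the first line.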
  a. What does the distribution of LungCap look like?

    The distribution of LungCap appears approximately normal.

# Histogram of LungCap on the density scale
hist(LungCapData$LungCap, xlab = 'LungCap', main = '', freq = FALSE)
Error in hist(LungCapData$LungCap, xlab = "LungCap", main = "", freq = F): object 'LungCapData' not found
  b. Compare the probability distribution of LungCap for males and females.

    In the comparative boxplot, males tend to have a higher lung capacity than females.

boxplot(LungCapData$LungCap ~ LungCapData$Gender,
        col = c("#FFE0B2", "#FFA726"))
Error in eval(predvars, data, env): object 'LungCapData' not found

c. Compare the mean lung capacities for smokers and non-smokers. Does it make sense?

    Comparing the means, the lung capacity for smokers is higher than for non-smokers.

#Mean Lung capacities of smokers
LungCapData %>%
  filter(Smoke == 'yes') %>%
  pull(LungCap) %>%
  mean()
Error in filter(., Smoke == "yes"): object 'LungCapData' not found
#Mean Lung capacities of non-smokers
LungCapData %>%
  filter(Smoke == 'no') %>%
  pull(LungCap) %>%
  mean()
Error in filter(., Smoke == "no"): object 'LungCapData' not found
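The hint in the assignment suggests group_by() and summarize(), which return both means in one call instead of two separate filter() pipelines. A sketch on a small made-up data frame (toy_lung and its values are illustrative, not the real LungCapData):

```r
library(dplyr)

# Toy data for illustration only; these values are not from LungCapData.
toy_lung <- tibble(
  Smoke   = c("no", "no", "yes", "yes"),
  LungCap = c(6, 8, 9, 11)
)

# One row per Smoke level, with the group mean and the group size
means_by_smoke <- toy_lung %>%
  group_by(Smoke) %>%
  summarize(mean_lungcap = mean(LungCap), n = n())

means_by_smoke
```

With the real data loaded, replacing toy_lung with LungCapData should give the same comparison as the two pipelines above.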

d. Examine the relationship between Smoking and Lung Capacity within age groups: “less than or equal to 13”, “14 to 15”, “16 to 17”, and “greater than or equal to 18”.

#new var for Age Groups
LungCapData$Age_Cat <- cut(LungCapData$Age,
                           breaks = c(0,13,15,17,25),
                           labels = c('less than or equal to 13','14 to 15','16 to 17','greater than or equal to 18'))
Error in cut(LungCapData$Age, breaks = c(0, 13, 15, 17, 25), labels = c("less than or equal to 13", : object 'LungCapData' not found
ggplot(LungCapData, aes(x=Smoke, y=LungCap)) +
  geom_boxplot() +
  facet_wrap(~Age_Cat, scales="free")
Error in ggplot(LungCapData, aes(x = Smoke, y = LungCap)): object 'LungCapData' not found
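cut() builds intervals that are closed on the right by default, so breaks = c(0,13,15,17,25) yields (0,13], (13,15], (15,17] and (17,25], which lines up with the four labels. The check below runs the same call on a few hand-picked ages (the vector ages is made up for illustration). With these breaks any age above 25 becomes NA; using Inf as the last break would avoid that, assuming no upper cap is intended.

```r
ages <- c(3, 13, 14, 15, 16, 17, 18, 19)  # illustrative values, not from the data

age_cat <- cut(ages,
               breaks = c(0, 13, 15, 17, 25),
               labels = c('less than or equal to 13', '14 to 15',
                          '16 to 17', 'greater than or equal to 18'))

# Show each age next to the group it falls in
data.frame(ages, age_cat)

# An age outside (0, 25] is not assigned to any group
cut(30, breaks = c(0, 13, 15, 17, 25))
```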

e. Compare the lung capacities for smokers and non-smokers within each age group. Is your answer different from the one in part c? What could possibly be going on here?

    Age acts as an intervening variable: most young children do not smoke at all, and they have smaller lung capacities because of their size. The non-smoker group is therefore weighted toward young children, which would pull its overall mean down.

# One panel per age in years (a stray "--" after this call caused a parse error)
ggplot(LungCapData, aes(x=Smoke, y=LungCap)) +
  geom_boxplot() +
  facet_wrap(~Age, scales="free")
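The boxplots can be backed by numbers by computing the mean for each combination of age group and smoking status. The sketch below uses made-up values (toy_lung and its numbers are illustrative only), chosen to show how an overall comparison can point the opposite way from every within-group comparison when smokers and non-smokers are unevenly spread across age groups.

```r
library(dplyr)

# Toy data for illustration only: mostly young non-smokers, mostly older smokers.
toy_lung <- tibble(
  Age_Cat = c(rep("less than or equal to 13", 5),
              rep("greater than or equal to 18", 4)),
  Smoke   = c("no", "no", "no", "no", "yes", "no", "yes", "yes", "yes"),
  LungCap = c(5, 6, 5, 6, 5, 11, 10, 10, 10)
)

# Overall: smokers have the higher mean
overall <- toy_lung %>%
  group_by(Smoke) %>%
  summarize(mean_lungcap = mean(LungCap))

# Within each age group: smokers have the lower mean
by_age <- toy_lung %>%
  group_by(Age_Cat, Smoke) %>%
  summarize(mean_lungcap = mean(LungCap), .groups = "drop")

overall
by_age
```

Running the same two summaries on LungCapData (after creating Age_Cat) would show whether the real data follow this pattern.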

f. Calculate the covariance and correlation between Lung Capacity and Age. (Use the cov() and cor() functions in R.)

#correlation
LungCapData %>% 
  summarize(correlation = cor(LungCap, Age))
Error in summarize(., correlation = cor(LungCap, Age)): object 'LungCapData' not found
#covariance
LungCapData %>% 
  summarize(covariance = cov(LungCap, Age))
Error in summarize(., covariance = cov(LungCap, Age)): object 'LungCapData' not found
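Covariance and correlation are linked by cor(x, y) = cov(x, y) / (sd(x) * sd(y)), so the correlation is the covariance rescaled to be unit-free. A quick check on made-up vectors (x and y are illustrative stand-ins, not the real Age and LungCap columns):

```r
x <- c(3, 7, 10, 14, 18)          # stand-in for Age
y <- c(2.1, 4.8, 6.5, 9.0, 11.2)  # stand-in for LungCap

covariance  <- cov(x, y)
correlation <- cor(x, y)

# Rescaling the covariance by both standard deviations reproduces cor()
all.equal(correlation, covariance / (sd(x) * sd(y)))
```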

2. Examination of Prison Convictions

PrisonData

Data

PrisonData <- tibble(
  prior_convictions = c(0,1,2,3,4),
  freq = c(128,434,160,64,24))

PrisonData
# A tibble: 5 × 2
  prior_convictions  freq
              <dbl> <dbl>
1                 0   128
2                 1   434
3                 2   160
4                 3    64
5                 4    24
num <- sum (PrisonData$freq)
num
[1] 810

a. What is the probability that a randomly selected inmate has exactly 2 prior convictions?

PrisonData %>% 
  filter(prior_convictions == 2) %>% 
  pull (freq) %>% 
  divide_by (num)
[1] 0.1975309

b. What is the probability that a randomly selected inmate has fewer than 2 prior convictions?

PrisonData %>% 
  filter(prior_convictions < 2) %>% 
  pull (freq) %>% 
  sum() %>%
  divide_by (num)
[1] 0.6938272

c. What is the probability that a randomly selected inmate has 2 or fewer prior convictions?

PrisonData %>% 
  filter(prior_convictions <= 2) %>% 
  pull (freq) %>% 
  sum() %>%
  divide_by (num)
[1] 0.891358

d. What is the probability that a randomly selected inmate has more than 2 prior convictions?

PrisonData %>% 
  filter(prior_convictions > 2) %>% 
  pull (freq) %>% 
  sum() %>%
  divide_by (num)
[1] 0.108642
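The four answers above can be cross-checked by turning the frequency table into a probability distribution once and reading the values off it. This sketch rebuilds the table as a base data frame (prison is a name introduced here) so that it runs on its own:

```r
# Same counts as PrisonData above
prison <- data.frame(
  prior_convictions = c(0, 1, 2, 3, 4),
  freq = c(128, 434, 160, 64, 24)
)

prison$prob     <- prison$freq / sum(prison$freq)  # P(X = x)
prison$cum_prob <- cumsum(prison$prob)             # P(X <= x)
prison

# P(X > 2) is the complement of P(X <= 2)
p_more_than_2 <- 1 - prison$cum_prob[prison$prior_convictions == 2]
p_more_than_2
```

The prob value for 2 prior convictions corresponds to part a, the cum_prob values for 1 and 2 to parts b and c, and the complement to part d.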

e. What is the expected value for the number of prior convictions?

# Expected value: weight each number of prior convictions by its relative frequency
with(PrisonData, sum(prior_convictions * freq) / num)

f. Calculate the variance and the standard deviation for the Prior Convictions.
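Part f has no code in the original. One way to answer it, treating the table as the full probability distribution of X (the number of prior convictions), is Var(X) = E[X^2] - (E[X])^2. This is a sketch under that assumption; it gives the population variance, not the n - 1 sample variance that var() would return on the expanded data.

```r
# Same counts as PrisonData above
x    <- c(0, 1, 2, 3, 4)
freq <- c(128, 434, 160, 64, 24)
p    <- freq / sum(freq)  # P(X = x)

expected_value <- sum(x * p)                        # E[X]
variance       <- sum(x^2 * p) - expected_value^2   # E[X^2] - (E[X])^2
std_dev        <- sqrt(variance)

c(expected_value = expected_value, variance = variance, std_dev = std_dev)
```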
